Overview

Dataset statistics

Statistic | Dataset A | Dataset B
Number of variables | 12 | 12
Number of observations | 446 | 446
Missing cells | 435 | 432
Missing cells (%) | 8.1% | 8.1%
Duplicate rows | 0 | 0
Duplicate rows (%) | 0.0% | 0.0%
Total size in memory | 45.3 KiB | 45.3 KiB
Average record size in memory | 104.0 B | 104.0 B
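The percentages in this table are plain ratios over the whole table. Missing cells (%) divides the missing-cell count by rows × variables, which gives 435 / (446 × 12) ≈ 8.1% for Dataset A. The average record size is the total memory divided by the row count: 45.3 KiB × 1024 / 446 ≈ 104.0 B. The sketch below reproduces that arithmetic in plain Python. It assumes the table is held as a list of equal-length tuples with `None` marking a missing cell, which simplifies how a pandas DataFrame works. The helper names are hypothetical.

```python
def missing_cell_stats(rows):
    """Return (missing cells, missing cells %, duplicate rows) for a list of tuples."""
    n_rows = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_rows * n_vars
    # A cell counts as missing when it is None (a stand-in for NaN).
    missing = sum(1 for row in rows for cell in row if cell is None)
    missing_pct = 100.0 * missing / total_cells if total_cells else 0.0
    # Duplicate rows = rows beyond the first occurrence of each distinct row.
    duplicates = n_rows - len(set(rows))
    return missing, missing_pct, duplicates


def average_record_size(total_bytes, n_rows):
    """Average memory per row, in bytes."""
    return total_bytes / n_rows


# Figures from the Dataset A column above.
print(round(100 * 435 / (446 * 12), 1))                 # 8.1
print(round(average_record_size(45.3 * 1024, 446), 1))  # 104.0
```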

Variable types

Type | Dataset A | Dataset B
Numeric | 5 | 5
Categorical | 4 | 4
Text | 3 | 3

Alerts

Dataset A | Dataset B | Alert type
Age has 88 (19.7%) missing values | Age has 88 (19.7%) missing values | Missing
Cabin has 346 (77.6%) missing values | Cabin has 344 (77.1%) missing values | Missing
PassengerId has unique values | PassengerId has unique values | Unique
Name has unique values | Name has unique values | Unique
SibSp has 309 (69.3%) zeros | SibSp has 298 (66.8%) zeros | Zeros
Parch has 336 (75.3%) zeros | Parch has 332 (74.4%) zeros | Zeros
Fare has 6 (1.3%) zeros | Fare has 5 (1.1%) zeros | Zeros

Reproduction

Item | Dataset A | Dataset B
Analysis started | 2024-01-08 15:05:20.561676 | 2024-01-08 15:05:24.828375
Analysis finished | 2024-01-08 15:05:24.827309 | 2024-01-08 15:05:28.722837
Duration | 4.27 seconds | 3.89 seconds
Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0
Download configuration | config.json | config.json
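Duration is the difference between the two timestamps. The sketch below recomputes it with the standard `datetime` module. It also shows that Dataset B's analysis started about a millisecond after Dataset A's finished, which suggests the two profiles ran one after the other.

```python
from datetime import datetime


def duration_seconds(started: str, finished: str) -> float:
    """Seconds elapsed between two 'YYYY-MM-DD HH:MM:SS.ffffff' timestamps."""
    return (datetime.fromisoformat(finished) - datetime.fromisoformat(started)).total_seconds()


a = duration_seconds("2024-01-08 15:05:20.561676", "2024-01-08 15:05:24.827309")
b = duration_seconds("2024-01-08 15:05:24.828375", "2024-01-08 15:05:28.722837")
gap = duration_seconds("2024-01-08 15:05:24.827309", "2024-01-08 15:05:24.828375")
print(f"{a:.2f} seconds, {b:.2f} seconds, gap {gap * 1000:.1f} ms")
# 4.27 seconds, 3.89 seconds, gap 1.1 ms
```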

Variables

PassengerId
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 447.11659 | 447.1704
Minimum | 1 | 1
Maximum | 890 | 891
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 1 | 1
5th percentile | 37.25 | 40.5
Q1 | 239.25 | 217.25
Median | 446.5 | 433
Q3 | 666.75 | 689.5
95th percentile | 841.75 | 856.75
Maximum | 890 | 891
Range | 889 | 890
Interquartile range (IQR) | 427.5 | 472.25

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 254.98656 | 267.31341
Coefficient of variation (CV) | 0.57029099 | 0.5977887
Kurtosis | -1.1483252 | -1.2855882
Mean | 447.11659 | 447.1704
Median Absolute Deviation (MAD) | 212.5 | 233.5
Skewness | -0.02202314 | 0.031052275
Sum | 199414 | 199438
Variance | 65018.148 | 71456.461
Monotonicity | Not monotonic | Not monotonic
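Most of these descriptive statistics are short formulas. The coefficient of variation is the standard deviation divided by the mean: 254.98656 / 447.11659 ≈ 0.5703 for Dataset A. The variance is the squared standard deviation. The sketch below makes two assumptions about the report, not statements of its implementation:

- It uses the sample (n - 1) standard deviation, which is the pandas default.
- It computes the median absolute deviation without the 1.4826 consistency factor.

A "Not monotonic" result means the rows are not ordered by that column. The function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (the mean must be non-zero)."""
    return statistics.stdev(values) / statistics.mean(values)


def median_absolute_deviation(values):
    """Median of |x - median(x)|, with no consistency scale factor."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def monotonicity(values):
    """Label a sequence as increasing, decreasing or not monotonic (ties allowed)."""
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        return "increasing"
    if all(a >= b for a, b in pairs):
        return "decreasing"
    return "not monotonic"


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(coefficient_of_variation(data), 4))  # 0.4276
print(median_absolute_deviation(data))           # 0.5
print(monotonicity([3, 1, 2]))                   # not monotonic
```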
[Figure: Histogram with fixed size bins (bins=50)]
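The histograms split the range from minimum to maximum into equal-width bins. The sketch below is a plain-Python approximation, not the library's implementation. It caps the bin count at the number of distinct values. That rule is an assumption inferred from two histograms in this report:

- SibSp has 7 distinct values and uses bins=7.
- Parch has 6 distinct values and uses bins=6.

```python
def fixed_width_histogram(values, max_bins=50):
    """Equal-width histogram between min and max; returns (edges, counts).

    Assumption: the bin count is capped at the number of distinct values.
    """
    bins = min(max_bins, len(set(values)))
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        # Bins are half-open [left, right), except the last, which also takes the maximum.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = fixed_width_histogram([0, 0, 1, 2, 2, 2, 8])
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0]
print(counts)  # [3, 3, 0, 1]
```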
Common values

Dataset A
Value | Count | Frequency (%)
546 | 1 | 0.2%
849 | 1 | 0.2%
868 | 1 | 0.2%
516 | 1 | 0.2%
888 | 1 | 0.2%
232 | 1 | 0.2%
533 | 1 | 0.2%
421 | 1 | 0.2%
334 | 1 | 0.2%
543 | 1 | 0.2%
Other values (436) | 436 | 97.8%

Dataset B
Value | Count | Frequency (%)
746 | 1 | 0.2%
376 | 1 | 0.2%
204 | 1 | 0.2%
535 | 1 | 0.2%
243 | 1 | 0.2%
57 | 1 | 0.2%
819 | 1 | 0.2%
737 | 1 | 0.2%
195 | 1 | 0.2%
555 | 1 | 0.2%
Other values (436) | 436 | 97.8%

Minimum 10 values

Dataset A
Value | Count | Frequency (%)
1 | 1 | 0.2%
2 | 1 | 0.2%
5 | 1 | 0.2%
6 | 1 | 0.2%
7 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%
12 | 1 | 0.2%
13 | 1 | 0.2%
14 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
1 | 1 | 0.2%
5 | 1 | 0.2%
6 | 1 | 0.2%
10 | 1 | 0.2%
11 | 1 | 0.2%
13 | 1 | 0.2%
14 | 1 | 0.2%
15 | 1 | 0.2%
18 | 1 | 0.2%
19 | 1 | 0.2%

Survived
Categorical

Statistic | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 2 | 2
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
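The character statistics group each code point by its Unicode general category. For example, Nd is "Decimal Number" and Zs is "Space Separator". Python's standard `unicodedata` module exposes the general category but not the script, so the sketch below covers only categories and a simple ASCII-range check. The block check assumes the report's "ASCII" block means code points U+0000 to U+007F. The mapping table lists only the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(text):
    """Count characters by Unicode general category (long name when known)."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def is_ascii_block(text):
    """True when every character lies in U+0000..U+007F."""
    return all(ord(ch) < 0x80 for ch in text)


print(category_counts("Nicholson, Mr. Arthur Ernest"))
```

Applied to the first Name sample from Dataset A, it finds 19 lowercase letters, 4 uppercase letters, 3 spaces and 2 punctuation marks.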

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 0 | 0
2nd row | 0 | 0
3rd row | 1 | 0
4th row | 0 | 0
5th row | 1 | 1

Common Values

Dataset A
Value | Count | Frequency (%)
0 | 265 | 59.4%
1 | 181 | 40.6%

Dataset B
Value | Count | Frequency (%)
0 | 296 | 66.4%
1 | 150 | 33.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values bar charts, Dataset A and Dataset B]

Words

Dataset A
Value | Count | Frequency (%)
0 | 265 | 59.4%
1 | 181 | 40.6%

Dataset B
Value | Count | Frequency (%)
0 | 296 | 66.4%
1 | 150 | 33.6%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
0 | 265 | 59.4%
1 | 181 | 40.6%

Dataset B
Value | Count | Frequency (%)
0 | 296 | 66.4%
1 | 150 | 33.6%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Most frequent character per category

Decimal Number

Dataset A
Value | Count | Frequency (%)
0 | 265 | 59.4%
1 | 181 | 40.6%

Dataset B
Value | Count | Frequency (%)
0 | 296 | 66.4%
1 | 150 | 33.6%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Common | 446 | 100.0%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
0 | 265 | 59.4%
1 | 181 | 40.6%

Dataset B
Value | Count | Frequency (%)
0 | 296 | 66.4%
1 | 150 | 33.6%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
0 | 265 | 59.4%
1 | 181 | 40.6%

Dataset B
Value | Count | Frequency (%)
0 | 296 | 66.4%
1 | 150 | 33.6%

Pclass
Categorical

Statistic | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 1 | 1
2nd row | 3 | 2
3rd row | 3 | 1
4th row | 3 | 3
5th row | 2 | 3

Common Values

Dataset A
Value | Count | Frequency (%)
3 | 243 | 54.5%
1 | 104 | 23.3%
2 | 99 | 22.2%

Dataset B
Value | Count | Frequency (%)
3 | 252 | 56.5%
1 | 108 | 24.2%
2 | 86 | 19.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values bar charts, Dataset A and Dataset B]

Words

Dataset A
Value | Count | Frequency (%)
3 | 243 | 54.5%
1 | 104 | 23.3%
2 | 99 | 22.2%

Dataset B
Value | Count | Frequency (%)
3 | 252 | 56.5%
1 | 108 | 24.2%
2 | 86 | 19.3%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
3 | 243 | 54.5%
1 | 104 | 23.3%
2 | 99 | 22.2%

Dataset B
Value | Count | Frequency (%)
3 | 252 | 56.5%
1 | 108 | 24.2%
2 | 86 | 19.3%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Most frequent character per category

Decimal Number

Dataset A
Value | Count | Frequency (%)
3 | 243 | 54.5%
1 | 104 | 23.3%
2 | 99 | 22.2%

Dataset B
Value | Count | Frequency (%)
3 | 252 | 56.5%
1 | 108 | 24.2%
2 | 86 | 19.3%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Common | 446 | 100.0%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
3 | 243 | 54.5%
1 | 104 | 23.3%
2 | 99 | 22.2%

Dataset B
Value | Count | Frequency (%)
3 | 252 | 56.5%
1 | 108 | 24.2%
2 | 86 | 19.3%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
3 | 243 | 54.5%
1 | 104 | 23.3%
2 | 99 | 22.2%

Dataset B
Value | Count | Frequency (%)
3 | 252 | 56.5%
1 | 108 | 24.2%
2 | 86 | 19.3%

Name
Text

Statistic | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 67 | 57
Median length | 50 | 46
Mean length | 26.526906 | 26.737668
Min length | 12 | 13

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 11831 | 11925
Distinct characters | 60 | 59
Distinct categories | 7 | 7
Distinct scripts | 2 | 2
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 446 | 446
Unique (%) | 100.0% | 100.0%

Sample

Row | Dataset A | Dataset B
1st row | Nicholson, Mr. Arthur Ernest | Crosby, Capt. Edward Gifford
2nd row | Nosworthy, Mr. Richard Cater | Norman, Mr. Robert Douglas
3rd row | de Mulder, Mr. Theodore | Robbins, Mr. Victor
4th row | Asplund, Master. Clarence Gustaf Hugo | Petterson, Mr. Johan Emil
5th row | Richards, Master. George Sibley | Andersen-Jensen, Miss. Carla Christine Nielsine

Words

Dataset A
Value | Count | Frequency (%)
mr | 264 | 14.7%
miss | 98 | 5.5%
mrs | 60 | 3.3%
william | 30 | 1.7%
henry | 21 | 1.2%
master | 20 | 1.1%
john | 20 | 1.1%
thomas | 15 | 0.8%
james | 14 | 0.8%
mary | 12 | 0.7%
Other values (876) | 1243 | 69.2%

Dataset B
Value | Count | Frequency (%)
mr | 271 | 15.1%
miss | 97 | 5.4%
mrs | 53 | 2.9%
william | 31 | 1.7%
john | 22 | 1.2%
henry | 21 | 1.2%
master | 17 | 0.9%
james | 13 | 0.7%
george | 12 | 0.7%
edward | 10 | 0.6%
Other values (873) | 1252 | 69.6%
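The word tables count lower-cased tokens with the surrounding punctuation removed. That is why "Mr." appears as "mr", why Ticket prefixes such as "A/4." appear as "a/4", and why a Cabin entry such as "B96 B98" contributes two words. The tokenizer below approximates that behaviour from those outputs. It is not the library's own rule and may differ on edge cases such as stop words.

```python
import string
from collections import Counter


def word_counts(values):
    """Lower-case, split on whitespace, strip surrounding punctuation (an approximation)."""
    counts = Counter()
    for v in values:
        if v is None:  # missing values contribute no words
            continue
        for token in v.lower().split():
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts


names = ["Nicholson, Mr. Arthur Ernest", "Robbins, Mr. Victor", "Asplund, Master. Clarence Gustaf Hugo"]
print(word_counts(names).most_common(1))  # [('mr', 2)]
```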

Most occurring characters

Dataset A
Value | Count | Frequency (%)
(space) | 1351 | 11.4%
r | 974 | 8.2%
e | 872 | 7.4%
a | 814 | 6.9%
s | 655 | 5.5%
i | 653 | 5.5%
n | 645 | 5.5%
M | 572 | 4.8%
l | 514 | 4.3%
o | 459 | 3.9%
Other values (50) | 4322 | 36.5%

Dataset B
Value | Count | Frequency (%)
(space) | 1354 | 11.4%
r | 972 | 8.2%
e | 856 | 7.2%
a | 813 | 6.8%
i | 672 | 5.6%
n | 661 | 5.5%
s | 643 | 5.4%
M | 558 | 4.7%
l | 533 | 4.5%
o | 495 | 4.2%
Other values (49) | 4368 | 36.6%
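Each of these tables lists the ten most frequent entries and folds the remainder into an "Other values (n)" row. In Dataset A, the ten characters shown cover 7,509 of the 11,831 characters. The remaining 4,322 (36.5%) are spread over 50 other characters.

The sketch below shows that summarisation with `collections.Counter`. The function name and the default `k=10` are illustrative. Ties, such as the two 446-count punctuation marks, may come out in a different order than in the report, because `Counter.most_common` keeps first-seen order for equal counts.

```python
from collections import Counter


def top_with_other(counter, k=10):
    """Top-k (value, count, %) rows plus an 'Other values (n)' row for the rest."""
    total = sum(counter.values())
    top = counter.most_common(k)
    rows = [(str(v), c, 100.0 * c / total) for v, c in top]
    rest = len(counter) - len(top)
    if rest > 0:
        rest_count = total - sum(c for _, c in top)
        rows.append((f"Other values ({rest})", rest_count, 100.0 * rest_count / total))
    return rows


for value, count, pct in top_with_other(Counter({"r": 974, "e": 872, "a": 814, "s": 655}), k=2):
    print(f"{value} | {count} | {pct:.1f}%")
# r | 974 | 29.4%
# e | 872 | 26.3%
# Other values (2) | 1469 | 44.3%
```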

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Lowercase Letter | 7585 | 64.1%
Uppercase Letter | 1811 | 15.3%
Space Separator | 1351 | 11.4%
Other Punctuation | 949 | 8.0%
Open Punctuation | 66 | 0.6%
Close Punctuation | 66 | 0.6%
Dash Punctuation | 3 | < 0.1%

Dataset B
Value | Count | Frequency (%)
Lowercase Letter | 7695 | 64.5%
Uppercase Letter | 1806 | 15.1%
Space Separator | 1354 | 11.4%
Other Punctuation | 945 | 7.9%
Close Punctuation | 60 | 0.5%
Open Punctuation | 60 | 0.5%
Dash Punctuation | 5 | < 0.1%

Most frequent character per category

Space Separator

Dataset A
Value | Count | Frequency (%)
(space) | 1351 | 100.0%

Dataset B
Value | Count | Frequency (%)
(space) | 1354 | 100.0%

Lowercase Letter

Dataset A
Value | Count | Frequency (%)
r | 974 | 12.8%
e | 872 | 11.5%
a | 814 | 10.7%
s | 655 | 8.6%
i | 653 | 8.6%
n | 645 | 8.5%
l | 514 | 6.8%
o | 459 | 6.1%
t | 321 | 4.2%
h | 261 | 3.4%
Other values (16) | 1417 | 18.7%

Dataset B
Value | Count | Frequency (%)
r | 972 | 12.6%
e | 856 | 11.1%
a | 813 | 10.6%
i | 672 | 8.7%
n | 661 | 8.6%
s | 643 | 8.4%
l | 533 | 6.9%
o | 495 | 6.4%
t | 322 | 4.2%
d | 252 | 3.3%
Other values (16) | 1476 | 19.2%

Uppercase Letter

Dataset A
Value | Count | Frequency (%)
M | 572 | 31.6%
A | 119 | 6.6%
H | 106 | 5.9%
J | 103 | 5.7%
E | 86 | 4.7%
S | 85 | 4.7%
B | 82 | 4.5%
C | 72 | 4.0%
W | 71 | 3.9%
R | 62 | 3.4%
Other values (15) | 453 | 25.0%

Dataset B
Value | Count | Frequency (%)
M | 558 | 30.9%
A | 125 | 6.9%
H | 113 | 6.3%
J | 98 | 5.4%
S | 95 | 5.3%
E | 84 | 4.7%
C | 83 | 4.6%
W | 69 | 3.8%
D | 62 | 3.4%
B | 60 | 3.3%
Other values (15) | 459 | 25.4%

Other Punctuation

Dataset A
Value | Count | Frequency (%)
. | 446 | 47.0%
, | 446 | 47.0%
" | 50 | 5.3%
' | 6 | 0.6%
/ | 1 | 0.1%

Dataset B
Value | Count | Frequency (%)
, | 446 | 47.2%
. | 446 | 47.2%
" | 50 | 5.3%
' | 3 | 0.3%

Open Punctuation

Dataset A
Value | Count | Frequency (%)
( | 66 | 100.0%

Dataset B
Value | Count | Frequency (%)
( | 60 | 100.0%

Close Punctuation

Dataset A
Value | Count | Frequency (%)
) | 66 | 100.0%

Dataset B
Value | Count | Frequency (%)
) | 60 | 100.0%

Dash Punctuation

Dataset A
Value | Count | Frequency (%)
- | 3 | 100.0%

Dataset B
Value | Count | Frequency (%)
- | 5 | 100.0%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Latin | 9396 | 79.4%
Common | 2435 | 20.6%

Dataset B
Value | Count | Frequency (%)
Latin | 9501 | 79.7%
Common | 2424 | 20.3%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
(space) | 1351 | 55.5%
. | 446 | 18.3%
, | 446 | 18.3%
( | 66 | 2.7%
) | 66 | 2.7%
" | 50 | 2.1%
' | 6 | 0.2%
- | 3 | 0.1%
/ | 1 | < 0.1%

Dataset B
Value | Count | Frequency (%)
(space) | 1354 | 55.9%
, | 446 | 18.4%
. | 446 | 18.4%
) | 60 | 2.5%
( | 60 | 2.5%
" | 50 | 2.1%
- | 5 | 0.2%
' | 3 | 0.1%

Latin

Dataset A
Value | Count | Frequency (%)
r | 974 | 10.4%
e | 872 | 9.3%
a | 814 | 8.7%
s | 655 | 7.0%
i | 653 | 6.9%
n | 645 | 6.9%
M | 572 | 6.1%
l | 514 | 5.5%
o | 459 | 4.9%
t | 321 | 3.4%
Other values (41) | 2917 | 31.0%

Dataset B
Value | Count | Frequency (%)
r | 972 | 10.2%
e | 856 | 9.0%
a | 813 | 8.6%
i | 672 | 7.1%
n | 661 | 7.0%
s | 643 | 6.8%
M | 558 | 5.9%
l | 533 | 5.6%
o | 495 | 5.2%
t | 322 | 3.4%
Other values (41) | 2976 | 31.3%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 11831 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 11925 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
(space) | 1351 | 11.4%
r | 974 | 8.2%
e | 872 | 7.4%
a | 814 | 6.9%
s | 655 | 5.5%
i | 653 | 5.5%
n | 645 | 5.5%
M | 572 | 4.8%
l | 514 | 4.3%
o | 459 | 3.9%
Other values (50) | 4322 | 36.5%

Dataset B
Value | Count | Frequency (%)
(space) | 1354 | 11.4%
r | 972 | 8.2%
e | 856 | 7.2%
a | 813 | 6.8%
i | 672 | 5.6%
n | 661 | 5.5%
s | 643 | 5.4%
M | 558 | 4.7%
l | 533 | 4.5%
o | 495 | 4.2%
Other values (49) | 4368 | 36.6%

Sex
Categorical

Statistic | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 6 | 6
Median length | 4 | 4
Mean length | 4.7040359 | 4.6726457
Min length | 4 | 4
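For a categorical column, the mean length is the count-weighted average of the category lengths. For Dataset A this is (289 × 4 + 157 × 6) / 446 = 2098 / 446 ≈ 4.7040359. The same sum also gives the 2,098 total characters reported below. A sketch, with a hypothetical function name:

```python
def mean_length(value_counts):
    """Count-weighted mean string length and total character count."""
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    n = sum(value_counts.values())
    return total_chars / n, total_chars


mean_a, chars_a = mean_length({"male": 289, "female": 157})
mean_b, chars_b = mean_length({"male": 296, "female": 150})
print(round(mean_a, 7), chars_a, round(mean_b, 7), chars_b)  # 4.7040359 2098 4.6726457 2084
```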

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 2098 | 2084
Distinct characters | 5 | 5
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | male | male
2nd row | male | male
3rd row | male | male
4th row | male | male
5th row | male | female

Common Values

Dataset A
Value | Count | Frequency (%)
male | 289 | 64.8%
female | 157 | 35.2%

Dataset B
Value | Count | Frequency (%)
male | 296 | 66.4%
female | 150 | 33.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values bar charts, Dataset A and Dataset B]

Words

Dataset A
Value | Count | Frequency (%)
male | 289 | 64.8%
female | 157 | 35.2%

Dataset B
Value | Count | Frequency (%)
male | 296 | 66.4%
female | 150 | 33.6%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
e | 603 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 157 | 7.5%

Dataset B
Value | Count | Frequency (%)
e | 596 | 28.6%
m | 446 | 21.4%
a | 446 | 21.4%
l | 446 | 21.4%
f | 150 | 7.2%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Lowercase Letter | 2098 | 100.0%

Dataset B
Value | Count | Frequency (%)
Lowercase Letter | 2084 | 100.0%

Most frequent character per category

Lowercase Letter

Dataset A
Value | Count | Frequency (%)
e | 603 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 157 | 7.5%

Dataset B
Value | Count | Frequency (%)
e | 596 | 28.6%
m | 446 | 21.4%
a | 446 | 21.4%
l | 446 | 21.4%
f | 150 | 7.2%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Latin | 2098 | 100.0%

Dataset B
Value | Count | Frequency (%)
Latin | 2084 | 100.0%

Most frequent character per script

Latin

Dataset A
Value | Count | Frequency (%)
e | 603 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 157 | 7.5%

Dataset B
Value | Count | Frequency (%)
e | 596 | 28.6%
m | 446 | 21.4%
a | 446 | 21.4%
l | 446 | 21.4%
f | 150 | 7.2%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 2098 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 2084 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
e | 603 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 157 | 7.5%

Dataset B
Value | Count | Frequency (%)
e | 596 | 28.6%
m | 446 | 21.4%
a | 446 | 21.4%
l | 446 | 21.4%
f | 150 | 7.2%

Age
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 73 | 74
Distinct (%) | 20.4% | 20.7%
Missing | 88 | 88
Missing (%) | 19.7% | 19.7%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 29.365922 | 29.526536
Minimum | 0.67 | 0.42
Maximum | 74 | 80
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB
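The Age block uses two different denominators:

- Distinct (%) divides by the 358 non-missing values: 73 / 358 ≈ 20.4%.
- The frequency column in the value tables divides by all 446 rows, including missing ones. This gives 15 / 446 ≈ 3.4% for age 30 and 88 / 446 ≈ 19.7% for the (Missing) row.

The sketch below makes the distinction explicit. `None` stands in for a missing value, and the function name is hypothetical.

```python
from collections import Counter


def distinct_and_frequencies(values):
    """Distinct count and % over non-missing values; value and missing % over all rows."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n_rows = len(values)
    distinct_pct = 100.0 * len(counts) / len(present)
    freq_pct = {v: 100.0 * c / n_rows for v, c in counts.items()}
    missing_pct = 100.0 * (n_rows - len(present)) / n_rows
    return len(counts), distinct_pct, freq_pct, missing_pct


n, distinct_pct, freq_pct, missing_pct = distinct_and_frequencies([30, 30, 22, None])
print(n, round(distinct_pct, 1), freq_pct[30], missing_pct)  # 2 66.7 50.0 25.0
```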

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0.67 | 0.42
5th percentile | 4 | 4
Q1 | 20 | 19.25
Median | 28 | 28
Q3 | 37.75 | 38
95th percentile | 58 | 57.15
Maximum | 74 | 80
Range | 73.33 | 79.58
Interquartile range (IQR) | 17.75 | 18.75

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 14.59244 | 15.003293
Coefficient of variation (CV) | 0.49691747 | 0.50812913
Kurtosis | 0.17882939 | 0.3822428
Mean | 29.365922 | 29.526536
Median Absolute Deviation (MAD) | 8 | 9
Skewness | 0.429838 | 0.50290276
Sum | 10513 | 10570.5
Variance | 212.93929 | 225.09881
Monotonicity | Not monotonic | Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Dataset A
Value | Count | Frequency (%)
30 | 15 | 3.4%
22 | 14 | 3.1%
21 | 14 | 3.1%
19 | 13 | 2.9%
24 | 13 | 2.9%
18 | 13 | 2.9%
25 | 12 | 2.7%
29 | 11 | 2.5%
28 | 11 | 2.5%
27 | 11 | 2.5%
Other values (63) | 231 | 51.8%
(Missing) | 88 | 19.7%

Dataset B
Value | Count | Frequency (%)
25 | 16 | 3.6%
19 | 15 | 3.4%
18 | 13 | 2.9%
28 | 13 | 2.9%
16 | 13 | 2.9%
24 | 12 | 2.7%
27 | 12 | 2.7%
22 | 12 | 2.7%
26 | 11 | 2.5%
32 | 11 | 2.5%
Other values (64) | 230 | 51.6%
(Missing) | 88 | 19.7%

Minimum 10 values

Dataset A
Value | Count | Frequency (%)
0.67 | 1 | 0.2%
0.75 | 2 | 0.4%
0.83 | 1 | 0.2%
1 | 3 | 0.7%
2 | 4 | 0.9%
3 | 4 | 0.9%
4 | 6 | 1.3%
5 | 3 | 0.7%
7 | 2 | 0.4%
8 | 3 | 0.7%

Dataset B
Value | Count | Frequency (%)
0.42 | 1 | 0.2%
0.75 | 2 | 0.4%
0.83 | 2 | 0.4%
0.92 | 1 | 0.2%
1 | 3 | 0.7%
2 | 6 | 1.3%
3 | 1 | 0.2%
4 | 5 | 1.1%
5 | 4 | 0.9%
6 | 1 | 0.2%

SibSp
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 7 | 7
Distinct (%) | 1.6% | 1.6%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.52017937 | 0.56502242
Minimum | 0 | 0
Maximum | 8 | 8
Zeros | 309 | 298
Zeros (%) | 69.3% | 66.8%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5th percentile | 0 | 0
Q1 | 0 | 0
Median | 0 | 0
Q3 | 1 | 1
95th percentile | 2 | 3
Maximum | 8 | 8
Range | 8 | 8
Interquartile range (IQR) | 1 | 1

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 1.1310911 | 1.156918
Coefficient of variation (CV) | 2.1744252 | 2.0475612
Kurtosis | 18.17709 | 16.229817
Mean | 0.52017937 | 0.56502242
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 3.763717 | 3.5364268
Sum | 232 | 252
Variance | 1.2793672 | 1.3384592
Monotonicity | Not monotonic | Not monotonic
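SibSp takes only seven values, so its skewness and kurtosis can be recomputed from the value counts listed below. The sketch assumes the bias-corrected estimators that pandas' `Series.skew()` and `Series.kurt()` use: adjusted Fisher-Pearson G1 and excess kurtosis G2. With Dataset A's counts it should agree with the reported 3.763717 and 18.17709 to about four decimal places, which supports that assumption.

```python
import math


def central_sums(counts):
    """n, mean and sums of 2nd/3rd/4th-power deviations from a {value: count} table."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    s2 = sum(c * (v - mean) ** 2 for v, c in counts.items())
    s3 = sum(c * (v - mean) ** 3 for v, c in counts.items())
    s4 = sum(c * (v - mean) ** 4 for v, c in counts.items())
    return n, mean, s2, s3, s4


def skewness(counts):
    """Adjusted Fisher-Pearson (bias-corrected) sample skewness G1."""
    n, _, s2, s3, _ = central_sums(counts)
    m2, m3 = s2 / n, s3 / n
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


def excess_kurtosis(counts):
    """Bias-corrected excess kurtosis G2 (0 for a normal distribution)."""
    n, _, s2, _, s4 = central_sums(counts)
    m2, m4 = s2 / n, s4 / n
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


sibsp_a = {0: 309, 1: 99, 2: 16, 3: 5, 4: 11, 5: 2, 8: 4}
print(skewness(sibsp_a), excess_kurtosis(sibsp_a))  # close to 3.763717 and 18.17709
```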
[Figure: Histogram with fixed size bins (bins=7)]

Common values

Dataset A
Value | Count | Frequency (%)
0 | 309 | 69.3%
1 | 99 | 22.2%
2 | 16 | 3.6%
4 | 11 | 2.5%
3 | 5 | 1.1%
8 | 4 | 0.9%
5 | 2 | 0.4%

Dataset B
Value | Count | Frequency (%)
0 | 298 | 66.8%
1 | 104 | 23.3%
2 | 20 | 4.5%
4 | 10 | 2.2%
3 | 7 | 1.6%
8 | 4 | 0.9%
5 | 3 | 0.7%

Minimum 10 values

Dataset A
Value | Count | Frequency (%)
0 | 309 | 69.3%
1 | 99 | 22.2%
2 | 16 | 3.6%
3 | 5 | 1.1%
4 | 11 | 2.5%
5 | 2 | 0.4%
8 | 4 | 0.9%

Dataset B
Value | Count | Frequency (%)
0 | 298 | 66.8%
1 | 104 | 23.3%
2 | 20 | 4.5%
3 | 7 | 1.6%
4 | 10 | 2.2%
5 | 3 | 0.7%
8 | 4 | 0.9%

Parch
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 6 | 6
Distinct (%) | 1.3% | 1.3%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.38340807 | 0.43049327
Minimum | 0 | 0
Maximum | 5 | 5
Zeros | 336 | 332
Zeros (%) | 75.3% | 74.4%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5th percentile | 0 | 0
Q1 | 0 | 0
Median | 0 | 0
Q3 | 0 | 1
95th percentile | 2 | 2
Maximum | 5 | 5
Range | 5 | 5
Interquartile range (IQR) | 0 | 1
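These quantiles are consistent with linear interpolation between order statistics, the default in numpy and pandas. That interpolation rule is an assumption, but it reproduces every Parch quantile shown. Rebuilding Parch from its value counts shows why Q3, and therefore the IQR, differs between the datasets even though both are dominated by zeros. With 446 values, Q3 sits at position 0.75 × 445 = 333.75 in the sorted data. Dataset A has 336 zeros, so that position is still 0. Dataset B has only 332 zeros, so it lands on 1.

```python
def quantile_linear(sorted_values, q):
    """Quantile by linear interpolation at position (n - 1) * q."""
    pos = (len(sorted_values) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def expand(counts):
    """Rebuild a sorted sample from a {value: count} table."""
    return [v for v in sorted(counts) for _ in range(counts[v])]


parch_a = expand({0: 336, 1: 60, 2: 45, 3: 1, 4: 2, 5: 2})
parch_b = expand({0: 332, 1: 59, 2: 44, 3: 4, 4: 2, 5: 5})

for name, data in (("A", parch_a), ("B", parch_b)):
    q1, q3 = quantile_linear(data, 0.25), quantile_linear(data, 0.75)
    print(name, q1, q3, q3 - q1, quantile_linear(data, 0.95))
# A 0.0 0.0 0.0 2.0
# B 0.0 1.0 1.0 2.0
```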

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 0.77232644 | 0.88094088
Coefficient of variation (CV) | 2.0143719 | 2.0463523
Kurtosis | 7.3553721 | 8.1877072
Mean | 0.38340807 | 0.43049327
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 2.4232583 | 2.6097847
Sum | 171 | 192
Variance | 0.59648813 | 0.77605683
Monotonicity | Not monotonic | Not monotonic

[Figure: Histogram with fixed size bins (bins=6)]

Common values

Dataset A
Value | Count | Frequency (%)
0 | 336 | 75.3%
1 | 60 | 13.5%
2 | 45 | 10.1%
5 | 2 | 0.4%
4 | 2 | 0.4%
3 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0 | 332 | 74.4%
1 | 59 | 13.2%
2 | 44 | 9.9%
5 | 5 | 1.1%
3 | 4 | 0.9%
4 | 2 | 0.4%

Minimum 10 values

Dataset A
Value | Count | Frequency (%)
0 | 336 | 75.3%
1 | 60 | 13.5%
2 | 45 | 10.1%
3 | 1 | 0.2%
4 | 2 | 0.4%
5 | 2 | 0.4%

Dataset B
Value | Count | Frequency (%)
0 | 332 | 74.4%
1 | 59 | 13.2%
2 | 44 | 9.9%
3 | 4 | 0.9%
4 | 2 | 0.4%
5 | 5 | 1.1%

Ticket
Text

Statistic | Dataset A | Dataset B
Distinct | 382 | 382
Distinct (%) | 85.7% | 85.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 18 | 18
Median length | 17 | 17
Mean length | 6.7802691 | 6.7242152
Min length | 3 | 4

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 3024 | 2999
Distinct characters | 35 | 31
Distinct categories | 5 | 5
Distinct scripts | 2 | 2
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 333 | 332
Unique (%) | 74.7% | 74.4%
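Distinct and Unique measure different things. Distinct counts how many different tickets occur (382 in Dataset A). Unique counts tickets that occur exactly once (333). The gap means that 49 ticket values are shared by the remaining 446 - 333 = 113 rows, for example when several passengers held the same ticket. The sketch below uses `collections.Counter` on a toy list. The function name is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different non-missing values; unique = values seen exactly once."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


tickets = ["347082", "347082", "PC 17757", "693", "693", "693", "A/4. 39886", None]
print(distinct_and_unique(tickets))  # (4, 2)
```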

Sample

Row | Dataset A | Dataset B
1st row | 693 | WE/P 5735
2nd row | A/4. 39886 | 218629
3rd row | 345774 | PC 17757
4th row | 347077 | 347076
5th row | 29106 | 350046

Words

Dataset A
Value | Count | Frequency (%)
pc | 33 | 5.7%
c.a | 17 | 3.0%
a/5 | 8 | 1.4%
ca | 7 | 1.2%
1601 | 5 | 0.9%
ston/o | 5 | 0.9%
2 | 5 | 0.9%
sc/paris | 5 | 0.9%
ston/o2 | 5 | 0.9%
a/4 | 5 | 0.9%
Other values (407) | 479 | 83.4%

Dataset B
Value | Count | Frequency (%)
pc | 28 | 5.0%
c.a | 14 | 2.5%
a/5 | 8 | 1.4%
ca | 7 | 1.3%
347082 | 6 | 1.1%
w./c | 6 | 1.1%
sc/paris | 5 | 0.9%
3101295 | 4 | 0.7%
2 | 4 | 0.7%
ston/o | 4 | 0.7%
Other values (400) | 473 | 84.6%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
3 | 361 | 11.9%
1 | 343 | 11.3%
2 | 296 | 9.8%
7 | 249 | 8.2%
4 | 228 | 7.5%
6 | 214 | 7.1%
0 | 205 | 6.8%
5 | 198 | 6.5%
8 | 153 | 5.1%
9 | 153 | 5.1%
Other values (25) | 624 | 20.6%

Dataset B
Value | Count | Frequency (%)
3 | 367 | 12.2%
1 | 353 | 11.8%
2 | 296 | 9.9%
7 | 250 | 8.3%
4 | 236 | 7.9%
6 | 213 | 7.1%
0 | 204 | 6.8%
5 | 183 | 6.1%
9 | 164 | 5.5%
8 | 154 | 5.1%
Other values (21) | 579 | 19.3%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 2400 | 79.4%
Uppercase Letter | 323 | 10.7%
Other Punctuation | 156 | 5.2%
Space Separator | 128 | 4.2%
Lowercase Letter | 17 | 0.6%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 2420 | 80.7%
Uppercase Letter | 312 | 10.4%
Other Punctuation | 146 | 4.9%
Space Separator | 113 | 3.8%
Lowercase Letter | 8 | 0.3%

Most frequent character per category

Decimal Number

Dataset A
Value | Count | Frequency (%)
3 | 361 | 15.0%
1 | 343 | 14.3%
2 | 296 | 12.3%
7 | 249 | 10.4%
4 | 228 | 9.5%
6 | 214 | 8.9%
0 | 205 | 8.5%
5 | 198 | 8.2%
8 | 153 | 6.4%
9 | 153 | 6.4%

Dataset B
Value | Count | Frequency (%)
3 | 367 | 15.2%
1 | 353 | 14.6%
2 | 296 | 12.2%
7 | 250 | 10.3%
4 | 236 | 9.8%
6 | 213 | 8.8%
0 | 204 | 8.4%
5 | 183 | 7.6%
9 | 164 | 6.8%
8 | 154 | 6.4%

Space Separator

Dataset A
Value | Count | Frequency (%)
(space) | 128 | 100.0%

Dataset B
Value | Count | Frequency (%)
(space) | 113 | 100.0%

Other Punctuation

Dataset A
Value | Count | Frequency (%)
. | 108 | 69.2%
/ | 48 | 30.8%

Dataset B
Value | Count | Frequency (%)
. | 100 | 68.5%
/ | 46 | 31.5%

Uppercase Letter

Dataset A
Value | Count | Frequency (%)
C | 86 | 26.6%
P | 54 | 16.7%
A | 46 | 14.2%
O | 43 | 13.3%
S | 37 | 11.5%
N | 16 | 5.0%
T | 15 | 4.6%
W | 6 | 1.9%
F | 5 | 1.5%
I | 4 | 1.2%
Other values (6) | 11 | 3.4%

Dataset B
Value | Count | Frequency (%)
C | 75 | 24.0%
P | 49 | 15.7%
O | 47 | 15.1%
A | 39 | 12.5%
S | 34 | 10.9%
N | 18 | 5.8%
T | 17 | 5.4%
W | 10 | 3.2%
Q | 7 | 2.2%
I | 5 | 1.6%
Other values (4) | 11 | 3.5%

Lowercase Letter

Dataset A
Value | Count | Frequency (%)
a | 5 | 29.4%
s | 4 | 23.5%
r | 3 | 17.6%
i | 3 | 17.6%
l | 1 | 5.9%
e | 1 | 5.9%

Dataset B
Value | Count | Frequency (%)
a | 2 | 25.0%
r | 2 | 25.0%
i | 2 | 25.0%
s | 2 | 25.0%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 2684 | 88.8%
Latin | 340 | 11.2%

Dataset B
Value | Count | Frequency (%)
Common | 2679 | 89.3%
Latin | 320 | 10.7%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
3 | 361 | 13.5%
1 | 343 | 12.8%
2 | 296 | 11.0%
7 | 249 | 9.3%
4 | 228 | 8.5%
6 | 214 | 8.0%
0 | 205 | 7.6%
5 | 198 | 7.4%
8 | 153 | 5.7%
9 | 153 | 5.7%
Other values (3) | 284 | 10.6%

Dataset B
Value | Count | Frequency (%)
3 | 367 | 13.7%
1 | 353 | 13.2%
2 | 296 | 11.0%
7 | 250 | 9.3%
4 | 236 | 8.8%
6 | 213 | 8.0%
0 | 204 | 7.6%
5 | 183 | 6.8%
9 | 164 | 6.1%
8 | 154 | 5.7%
Other values (3) | 259 | 9.7%

Latin

Dataset A
Value | Count | Frequency (%)
C | 86 | 25.3%
P | 54 | 15.9%
A | 46 | 13.5%
O | 43 | 12.6%
S | 37 | 10.9%
N | 16 | 4.7%
T | 15 | 4.4%
W | 6 | 1.8%
F | 5 | 1.5%
a | 5 | 1.5%
Other values (12) | 27 | 7.9%

Dataset B
Value | Count | Frequency (%)
C | 75 | 23.4%
P | 49 | 15.3%
O | 47 | 14.7%
A | 39 | 12.2%
S | 34 | 10.6%
N | 18 | 5.6%
T | 17 | 5.3%
W | 10 | 3.1%
Q | 7 | 2.2%
I | 5 | 1.6%
Other values (8) | 19 | 5.9%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 3024 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 2999 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
3 | 361 | 11.9%
1 | 343 | 11.3%
2 | 296 | 9.8%
7 | 249 | 8.2%
4 | 228 | 7.5%
6 | 214 | 7.1%
0 | 205 | 6.8%
5 | 198 | 6.5%
8 | 153 | 5.1%
9 | 153 | 5.1%
Other values (25) | 624 | 20.6%

Dataset B
Value | Count | Frequency (%)
3 | 367 | 12.2%
1 | 353 | 11.8%
2 | 296 | 9.9%
7 | 250 | 8.3%
4 | 236 | 7.9%
6 | 213 | 7.1%
0 | 204 | 6.8%
5 | 183 | 6.1%
9 | 164 | 5.5%
8 | 154 | 5.1%
Other values (21) | 579 | 19.3%

Fare
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 181 | 183
Distinct (%) | 40.6% | 41.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 33.656978 | 32.819955
Minimum | 0 | 0
Maximum | 512.3292 | 512.3292
Zeros | 6 | 5
Zeros (%) | 1.3% | 1.1%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5th percentile | 7.2292 | 7.2292
Q1 | 7.8958 | 7.8958
Median | 14.25415 | 14.4542
Q3 | 31 | 31.275
95th percentile | 118.31875 | 118.31875
Maximum | 512.3292 | 512.3292
Range | 512.3292 | 512.3292
Interquartile range (IQR) | 23.1042 | 23.3792

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 54.525694 | 52.5157
Coefficient of variation (CV) | 1.6200413 | 1.600115
Kurtosis | 30.009989 | 34.233795
Mean | 33.656978 | 32.819955
Median Absolute Deviation (MAD) | 6.71875 | 6.7667
Skewness | 4.6670045 | 4.9406518
Sum | 15011.012 | 14637.7
Variance | 2973.0513 | 2757.8988
Monotonicity | Not monotonic | Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Dataset A
Value | Count | Frequency (%)
13 | 20 | 4.5%
7.8958 | 19 | 4.3%
10.5 | 18 | 4.0%
7.75 | 17 | 3.8%
8.05 | 17 | 3.8%
26 | 13 | 2.9%
7.775 | 11 | 2.5%
7.25 | 10 | 2.2%
7.925 | 10 | 2.2%
7.2292 | 9 | 2.0%
Other values (171) | 302 | 67.7%

Dataset B
Value | Count | Frequency (%)
7.8958 | 21 | 4.7%
13 | 20 | 4.5%
7.75 | 16 | 3.6%
8.05 | 15 | 3.4%
26 | 15 | 3.4%
10.5 | 13 | 2.9%
7.775 | 11 | 2.5%
7.8542 | 9 | 2.0%
7.925 | 9 | 2.0%
7.2292 | 8 | 1.8%
Other values (173) | 309 | 69.3%

Minimum 10 values

Dataset A
Value | Count | Frequency (%)
0 | 6 | 1.3%
4.0125 | 1 | 0.2%
6.2375 | 1 | 0.2%
6.4375 | 1 | 0.2%
6.45 | 1 | 0.2%
6.8583 | 1 | 0.2%
6.975 | 1 | 0.2%
7.05 | 2 | 0.4%
7.125 | 2 | 0.4%
7.225 | 4 | 0.9%

Dataset B
Value | Count | Frequency (%)
0 | 5 | 1.1%
6.4375 | 1 | 0.2%
6.45 | 1 | 0.2%
6.4958 | 1 | 0.2%
6.75 | 1 | 0.2%
6.95 | 1 | 0.2%
6.975 | 1 | 0.2%
7.05 | 4 | 0.9%
7.0542 | 1 | 0.2%
7.225 | 5 | 1.1%

Cabin
Text

Statistic | Dataset A | Dataset B
Distinct | 87 | 88
Distinct (%) | 87.0% | 86.3%
Missing | 346 | 344
Missing (%) | 77.6% | 77.1%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 15 | 15
Median length | 3 | 3
Mean length | 3.73 | 3.4901961
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 373 | 356
Distinct characters | 19 | 19
Distinct categories | 3 | 3
Distinct scripts | 2 | 2
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 75 | 77
Unique (%) | 75.0% | 75.5%

Sample

Row | Dataset A | Dataset B
1st row | D35 | B22
2nd row | A20 | B94
3rd row | T | E17
4th row | C95 | C110
5th row | B96 B98 | E33

Words

Dataset A
Value | Count | Frequency (%)
b96 | 3 | 2.5%
f | 3 | 2.5%
b98 | 3 | 2.5%
b5 | 2 | 1.7%
b49 | 2 | 1.7%
e25 | 2 | 1.7%
c27 | 2 | 1.7%
c25 | 2 | 1.7%
c23 | 2 | 1.7%
g73 | 2 | 1.7%
Other values (88) | 97 | 80.8%

Dataset B
Value | Count | Frequency (%)
g6 | 4 | 3.3%
b98 | 3 | 2.5%
b96 | 3 | 2.5%
f | 3 | 2.5%
d | 2 | 1.7%
c27 | 2 | 1.7%
c25 | 2 | 1.7%
c23 | 2 | 1.7%
b18 | 2 | 1.7%
e101 | 2 | 1.7%
Other values (89) | 95 | 79.2%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
B | 38 | 10.2%
2 | 33 | 8.8%
3 | 29 | 7.8%
C | 29 | 7.8%
1 | 27 | 7.2%
6 | 26 | 7.0%
4 | 22 | 5.9%
5 | 21 | 5.6%
8 | 21 | 5.6%
(space) | 20 | 5.4%
Other values (9) | 107 | 28.7%

Dataset B
Value | Count | Frequency (%)
B | 35 | 9.8%
2 | 31 | 8.7%
C | 30 | 8.4%
3 | 30 | 8.4%
6 | 24 | 6.7%
1 | 24 | 6.7%
7 | 22 | 6.2%
8 | 20 | 5.6%
5 | 20 | 5.6%
(space) | 18 | 5.1%
Other values (9) | 102 | 28.7%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 233 | 62.5%
Uppercase Letter | 120 | 32.2%
Space Separator | 20 | 5.4%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 218 | 61.2%
Uppercase Letter | 120 | 33.7%
Space Separator | 18 | 5.1%

Most frequent character per category

Uppercase Letter

Dataset A
Value | Count | Frequency (%)
B | 38 | 31.7%
C | 29 | 24.2%
D | 17 | 14.2%
E | 15 | 12.5%
F | 9 | 7.5%
A | 8 | 6.7%
G | 3 | 2.5%
T | 1 | 0.8%

Dataset B
Value | Count | Frequency (%)
B | 35 | 29.2%
C | 30 | 25.0%
D | 16 | 13.3%
E | 15 | 12.5%
F | 8 | 6.7%
A | 8 | 6.7%
G | 7 | 5.8%
T | 1 | 0.8%

Decimal Number

Dataset A
Value | Count | Frequency (%)
2 | 33 | 14.2%
3 | 29 | 12.4%
1 | 27 | 11.6%
6 | 26 | 11.2%
4 | 22 | 9.4%
5 | 21 | 9.0%
8 | 21 | 9.0%
9 | 19 | 8.2%
7 | 18 | 7.7%
0 | 17 | 7.3%

Dataset B
Value | Count | Frequency (%)
2 | 31 | 14.2%
3 | 30 | 13.8%
6 | 24 | 11.0%
1 | 24 | 11.0%
7 | 22 | 10.1%
8 | 20 | 9.2%
5 | 20 | 9.2%
9 | 17 | 7.8%
0 | 16 | 7.3%
4 | 14 | 6.4%

Space Separator

Dataset A
Value | Count | Frequency (%)
(space) | 20 | 100.0%

Dataset B
Value | Count | Frequency (%)
(space) | 18 | 100.0%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 253 | 67.8%
Latin | 120 | 32.2%

Dataset B
Value | Count | Frequency (%)
Common | 236 | 66.3%
Latin | 120 | 33.7%

Most frequent character per script

Latin

Dataset A
Value | Count | Frequency (%)
B | 38 | 31.7%
C | 29 | 24.2%
D | 17 | 14.2%
E | 15 | 12.5%
F | 9 | 7.5%
A | 8 | 6.7%
G | 3 | 2.5%
T | 1 | 0.8%

Dataset B
Value | Count | Frequency (%)
B | 35 | 29.2%
C | 30 | 25.0%
D | 16 | 13.3%
E | 15 | 12.5%
F | 8 | 6.7%
A | 8 | 6.7%
G | 7 | 5.8%
T | 1 | 0.8%

Common

Dataset A
Value | Count | Frequency (%)
2 | 33 | 13.0%
3 | 29 | 11.5%
1 | 27 | 10.7%
6 | 26 | 10.3%
4 | 22 | 8.7%
5 | 21 | 8.3%
8 | 21 | 8.3%
(space) | 20 | 7.9%
9 | 19 | 7.5%
7 | 18 | 7.1%

Dataset B
Value | Count | Frequency (%)
2 | 31 | 13.1%
3 | 30 | 12.7%
6 | 24 | 10.2%
1 | 24 | 10.2%
7 | 22 | 9.3%
8 | 20 | 8.5%
5 | 20 | 8.5%
(space) | 18 | 7.6%
9 | 17 | 7.2%
0 | 16 | 6.8%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 373 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 356 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
B | 38 | 10.2%
2 | 33 | 8.8%
3 | 29 | 7.8%
C | 29 | 7.8%
1 | 27 | 7.2%
6 | 26 | 7.0%
4 | 22 | 5.9%
5 | 21 | 5.6%
8 | 21 | 5.6%
(space) | 20 | 5.4%
Other values (9) | 107 | 28.7%

Dataset B
Value | Count | Frequency (%)
B | 35 | 9.8%
2 | 31 | 8.7%
C | 30 | 8.4%
3 | 30 | 8.4%
6 | 24 | 6.7%
1 | 24 | 6.7%
7 | 22 | 6.2%
8 | 20 | 5.6%
5 | 20 | 5.6%
(space) | 18 | 5.1%
Other values (9) | 102 | 28.7%

Embarked
Categorical

Statistic | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 1 | 0
Missing (%) | 0.2% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 445 | 446
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | S | S
2nd row | S | S
3rd row | S | C
4th row | S | S
5th row | S | S

Common Values

Dataset A
Value | Count | Frequency (%)
S | 318 | 71.3%
C | 88 | 19.7%
Q | 39 | 8.7%
(Missing) | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
S | 319 | 71.5%
C | 87 | 19.5%
Q | 40 | 9.0%
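The single missing Embarked value explains why the same counts carry different percentages in different tables of this section. Common Values divides by all 446 rows, giving 318 / 446 ≈ 71.3% for S. The character tables count only the 445 non-missing characters, giving 318 / 445 ≈ 71.5%. In the sketch below, `None` stands in for the missing value and the function name is hypothetical.

```python
from collections import Counter


def character_frequencies(values):
    """{char: (count, % of all characters in non-missing values)}."""
    chars = Counter(ch for v in values if v is not None for ch in v)
    total = sum(chars.values())
    return {ch: (n, 100.0 * n / total) for ch, n in chars.most_common()}


embarked_a = ["S"] * 318 + ["C"] * 88 + ["Q"] * 39 + [None]
count, pct = character_frequencies(embarked_a)["S"]
print(count, round(pct, 1), round(100 * count / len(embarked_a), 1))  # 318 71.5 71.3
```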

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values bar charts, Dataset A and Dataset B]

Words

Dataset A
Value | Count | Frequency (%)
s | 318 | 71.5%
c | 88 | 19.8%
q | 39 | 8.8%

Dataset B
Value | Count | Frequency (%)
s | 319 | 71.5%
c | 87 | 19.5%
q | 40 | 9.0%

Most occurring characters

ValueCountFrequency (%)
S 318
71.5%
C 88
 
19.8%
Q 39
 
8.8%
ValueCountFrequency (%)
S 319
71.5%
C 87
 
19.5%
Q 40
 
9.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 445
100.0%
ValueCountFrequency (%)
Uppercase Letter 446
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 318
71.5%
C 88
 
19.8%
Q 39
 
8.8%
ValueCountFrequency (%)
S 319
71.5%
C 87
 
19.5%
Q 40
 
9.0%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Latin | 445 | 100.0%

Dataset B
Value | Count | Frequency (%)
Latin | 446 | 100.0%

Most frequent character per script

Latin
Dataset A
Value | Count | Frequency (%)
S | 318 | 71.5%
C | 88 | 19.8%
Q | 39 | 8.8%

Dataset B
Value | Count | Frequency (%)
S | 319 | 71.5%
C | 87 | 19.5%
Q | 40 | 9.0%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII
Dataset A
Value | Count | Frequency (%)
S | 318 | 71.5%
C | 88 | 19.8%
Q | 39 | 8.8%

Dataset B
Value | Count | Frequency (%)
S | 319 | 71.5%
C | 87 | 19.5%
Q | 40 | 9.0%

Interactions

[Figures: interaction plots for every combination of the five numeric variables (5 × 5 = 25 panels), shown for Dataset A and Dataset B]

Missing values

[Figures: nullity by column, Dataset A and Dataset B]
A simple visualization of nullity by column.

[Figures: nullity matrix, Dataset A and Dataset B]
The nullity matrix is a data-dense display that makes patterns in data completeness quick to spot visually.
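Both diagrams are built from the same per-cell question: is this value present? A minimal pure-Python sketch, assuming rows are dicts with `None` for missing values and reduced to three columns for brevity; `nullity` is a hypothetical helper, and the three rows are the first three rows of the Dataset B sample below.

```python
def nullity(rows, columns):
    """Per-column non-missing counts plus a text nullity matrix ('#' present, '.' missing)."""
    counts = {col: sum(row.get(col) is not None for row in rows) for col in columns}
    matrix = [
        "".join("#" if row.get(col) is not None else "." for col in columns)
        for row in rows
    ]
    return counts, matrix


rows = [
    {"Age": 70.0, "Cabin": "B22", "Embarked": "S"},  # Crosby
    {"Age": 28.0, "Cabin": None, "Embarked": "S"},   # Norman
    {"Age": None, "Cabin": None, "Embarked": "C"},   # Robbins
]
counts, matrix = nullity(rows, ["Age", "Cabin", "Embarked"])
print(counts)            # {'Age': 2, 'Cabin': 1, 'Embarked': 3}
print("\n".join(matrix))  # ###, #.#, ..# on separate lines
```

The per-column counts correspond to the bar view; the row strings correspond to the matrix view, where runs of "." in the same column reveal completion patterns.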

Sample

Dataset A

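Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
545 | 546 | 0 | 1 | Nicholson, Mr. Arthur Ernest | male | 64.00 | 0 | 0 | 693 | 26.0000 | NaN | S
51 | 52 | 0 | 3 | Nosworthy, Mr. Richard Cater | male | 21.00 | 0 | 0 | A/4. 39886 | 7.8000 | NaN | S
286 | 287 | 1 | 3 | de Mulder, Mr. Theodore | male | 30.00 | 0 | 0 | 345774 | 9.5000 | NaN | S
182 | 183 | 0 | 3 | Asplund, Master. Clarence Gustaf Hugo | male | 9.00 | 4 | 2 | 347077 | 31.3875 | NaN | S
831 | 832 | 1 | 2 | Richards, Master. George Sibley | male | 0.83 | 1 | 1 | 29106 | 18.7500 | NaN | S
525 | 526 | 0 | 3 | Farrell, Mr. James | male | 40.50 | 0 | 0 | 367232 | 7.7500 | NaN | Q
330 | 331 | 1 | 3 | McCoy, Miss. Agnes | female | NaN | 2 | 0 | 367226 | 23.2500 | NaN | Q
870 | 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26.00 | 0 | 0 | 349248 | 7.8958 | NaN | S
248 | 249 | 1 | 1 | Beckwith, Mr. Richard Leonard | male | 37.00 | 1 | 1 | 11751 | 52.5542 | D35 | S
599 | 600 | 1 | 1 | Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan") | male | 49.00 | 1 | 0 | PC 17485 | 56.9292 | A20 | C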

Dataset B

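Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
745 | 746 | 0 | 1 | Crosby, Capt. Edward Gifford | male | 70.0 | 1 | 1 | WE/P 5735 | 71.0000 | B22 | S
562 | 563 | 0 | 2 | Norman, Mr. Robert Douglas | male | 28.0 | 0 | 0 | 218629 | 13.5000 | NaN | S
557 | 558 | 0 | 1 | Robbins, Mr. Victor | male | NaN | 0 | 0 | PC 17757 | 227.5250 | NaN | C
442 | 443 | 0 | 3 | Petterson, Mr. Johan Emil | male | 25.0 | 1 | 0 | 347076 | 7.7750 | NaN | S
192 | 193 | 1 | 3 | Andersen-Jensen, Miss. Carla Christine Nielsine | female | 19.0 | 1 | 0 | 350046 | 7.8542 | NaN | S
263 | 264 | 0 | 1 | Harrison, Mr. William | male | 40.0 | 0 | 0 | 112059 | 0.0000 | B94 | S
845 | 846 | 0 | 3 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 7.5500 | NaN | S
677 | 678 | 1 | 3 | Turja, Miss. Anna Sofia | female | 18.0 | 0 | 0 | 4138 | 9.8417 | NaN | S
651 | 652 | 1 | 2 | Doling, Miss. Elsie | female | 18.0 | 0 | 1 | 231919 | 23.0000 | NaN | S
285 | 286 | 0 | 3 | Stankovic, Mr. Ivan | male | 33.0 | 0 | 0 | 349239 | 8.6625 | NaN | C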

Dataset A

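Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
539 | 540 | 1 | 1 | Frolicher, Miss. Hedwig Margaritha | female | 22.0 | 0 | 2 | 13568 | 49.5000 | B39 | C
609 | 610 | 1 | 1 | Shutes, Miss. Elizabeth W | female | 40.0 | 0 | 0 | PC 17582 | 153.4625 | C125 | S
80 | 81 | 0 | 3 | Waelens, Mr. Achille | male | 22.0 | 0 | 0 | 345767 | 9.0000 | NaN | S
347 | 348 | 1 | 3 | Davison, Mrs. Thomas Henry (Mary E Finck) | female | NaN | 1 | 0 | 386525 | 16.1000 | NaN | S
168 | 169 | 0 | 1 | Baumann, Mr. John D | male | NaN | 0 | 0 | PC 17318 | 25.9250 | NaN | S
553 | 554 | 1 | 3 | Leeni, Mr. Fahim ("Philip Zenni") | male | 22.0 | 0 | 0 | 2620 | 7.2250 | NaN | C
762 | 763 | 1 | 3 | Barah, Mr. Hanna Assi | male | 20.0 | 0 | 0 | 2663 | 7.2292 | NaN | C
700 | 701 | 1 | 1 | Astor, Mrs. John Jacob (Madeleine Talmadge Force) | female | 18.0 | 1 | 0 | PC 17757 | 227.5250 | C62 C64 | C
860 | 861 | 0 | 3 | Hansen, Mr. Claus Peter | male | 41.0 | 2 | 0 | 350026 | 14.1083 | NaN | S
267 | 268 | 1 | 3 | Persson, Mr. Ernst Ulrik | male | 25.0 | 1 | 0 | 347083 | 7.7750 | NaN | S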

Dataset B

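Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
659 | 660 | 0 | 1 | Newell, Mr. Arthur Webster | male | 58.0 | 0 | 2 | 35273 | 113.2750 | D48 | C
493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 | NaN | C
408 | 409 | 0 | 3 | Birkeland, Mr. Hans Martin Monsen | male | 21.0 | 0 | 0 | 312992 | 7.7750 | NaN | S
55 | 56 | 1 | 1 | Woolner, Mr. Hugh | male | NaN | 0 | 0 | 19947 | 35.5000 | C52 | S
246 | 247 | 0 | 3 | Lindahl, Miss. Agda Thorilda Viktoria | female | 25.0 | 0 | 0 | 347071 | 7.7750 | NaN | S
355 | 356 | 0 | 3 | Vanden Steen, Mr. Leo Peter | male | 28.0 | 0 | 0 | 345783 | 9.5000 | NaN | S
266 | 267 | 0 | 3 | Panula, Mr. Ernesti Arvid | male | 16.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S
604 | 605 | 1 | 1 | Homer, Mr. Harry ("Mr E Haven") | male | 35.0 | 0 | 0 | 111426 | 26.5500 | NaN | C
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C
734 | 735 | 0 | 2 | Troupiansky, Mr. Moses Aaron | male | 23.0 | 0 | 0 | 233639 | 13.0000 | NaN | S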

Duplicate rows

Dataset A

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.
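A row counts as a duplicate only when every column matches another row, which is unlikely here since PassengerId is unique in both datasets. A minimal sketch of the check with a `Counter` over row tuples, assuming rows are plain tuples and missing values are `None` (`None == None` holds, whereas two separate `float("nan")` objects compare unequal, so NaN-bearing rows would need normalising first). `duplicate_rows` is a hypothetical helper, not ydata-profiling's implementation.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: count} for every row that occurs more than once, comparing all columns."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


rows = [
    (546, 0, 1, "S"),
    (52, 0, 3, "S"),
    (546, 0, 1, "S"),  # artificial duplicate, added only to show the output shape
]
print(duplicate_rows(rows))      # {(546, 0, 1, 'S'): 2}
print(duplicate_rows(rows[:2]))  # {}
```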